-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Elasticsearch feature ✨💥 (Lombiq Technologies: OSOE-83) #11052
Conversation
src/OrchardCore.Modules/OrchardCore.Contents/Indexing/ContentItemIndexCoordinator.cs
Outdated
Show resolved
Hide resolved
src/OrchardCore.Modules/OrchardCore.Contents/Indexing/ContentItemIndexCoordinator.cs
Outdated
Show resolved
Hide resolved
src/OrchardCore.Modules/OrchardCore.Flows/Indexing/BagPartIndexHandler.cs
Show resolved
Hide resolved
466b7cf
to
55819e6
Compare
Squashed commits. |
Any news on this by chance? |
Ping me on Teams or else when you get time. |
Is there any problem with this PR? |
I'm about to start working on it. I'm still on vacation ... |
found some problem:
|
src/OrchardCore/OrchardCore.Search.Elastic.Core/ElasticQueryService.cs
Outdated
Show resolved
Hide resolved
But in fact, it is a wrong check |
Should we check the link status before each operation of ES? Or throw an exception every time the API link fails |
We need to prevent allowing to create indexes if we don't have a successfully working ES connection. |
Search feature modules changes explanations with sound : 2022-07-22.13-08-22.mp4 |
Awesome! Would it be possible to merge the Lucene and Elastic UX, to make Elastic a drop-in replacement for the search backend? Like how you manage Media files the same regardless they're stored locally or in Azure Blob Storage. |
They are complementary. Both can be used at the same time. But of course, you could use only Elastic Search if that's what you want instead of Lucene. Right now, there are discrepancies between both implementations. You can't use a Lucene Query and copy/paste it as an Elastic Query because everything is "analyzed" with Elastic Search. So a Term Query becomes a Match Query. But, to remove this discrepancy we would need to make the Lucene implementation work like Elastic Search which would take more time. I will take some more time to make sure we are doing the proper thing here by setting every document as "Analyzed" with Elastic Search. But the thing is that with Elastic Search everything is also always stored as I documented in the first post. So, right now, we can't just switch from Lucene to Elastic Search seamlessly unless we would keep the same ContentIndexSettings for both implementations. I removed the "Stored" option for Elastic because it was unnecessary. Also, another thing to take into consideration. We parse Lucene Queries with our own QueryParser implementation which follows Lucene 4.0 standards and Nest uses Lucene 7.x. not to mention that Lucene 8.x will use a newer implementation of Nest which is getting renamed to Lots of moving parts with Elastic compared with Lucene. Maybe it is better to keep Lucene and Elastic implementations unique for now. |
From this, it seems to me that if these two are independent features, we'll have two increasingly diverging Lucene-based search implementations. Why I see this as an issue because while Elastic offers a lot more than vanilla Lucene, I'd also consider it a "production" version of search, since otherwise, you need to employ workarounds in a production app unless you have a single instance and that's all to your environment (like with staged publishing, blue-green deployment, multi-node hosting). Contrast this with the Lucene module which just works without any external dependencies and is thus perfect for local development and testing. Again, like local storage-Azure Blob Storage: You can use both anywhere in principle, but most possibly you want to use local storage while you write the code, and simply switch to Blob Storage in production. I don't have any insights into how, when, or even whether it makes sense to tackle this, but wanted to share this viewpoint. |
@Piedone Does the Pull Request answer all your needs so far as a first iteration? On my side, I think it is ready as it is. There is only one comment on your side about the Migration class name which I won't take action on for now. I need to understand the reasoning behind this. Maybe @jtkech had some thoughts about this because of his tenant removal feature. I'm not sure if we should start having different migration files per module when there is only one feature in that module. One migration file per feature makes more sense. Else, if we want to get this merged I need to get approvals from the main contributors. So please add your comments/reviews. |
I will comment soon the related comment, hmm and another one I just saw related to an async call, maybe the last one first ;) |
@Skrypt I did 2 comments related to the async delegate and migration. Doesn't appear in the main thread, maybe I forgot to submit them as a review, so look at the file changes. |
Can't find it either. There is too much stuff in here. |
Once you've addressed all my comments, please re-request review. |
src/OrchardCore.Modules/OrchardCore.Search.Lucene/Services/LuceneIndexManager.cs
Show resolved
Hide resolved
//"OrchardCore_Elasticsearch": { | ||
// "ConnectionType": "SingleNodeConnectionPool", | ||
// "Url": "http://localhost", | ||
// "Ports": [ 9200 ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comma is still missing here.
src/OrchardCore.Modules/OrchardCore.Search.Elasticsearch/Views/Admin/Index.cshtml
Outdated
Show resolved
Hide resolved
src/OrchardCore.Modules/OrchardCore.Search.Elasticsearch/Views/Admin/Index.cshtml
Outdated
Show resolved
Hide resolved
src/OrchardCore.Modules/OrchardCore.Search.Elasticsearch/Views/Admin/Index.cshtml
Outdated
Show resolved
Hide resolved
...Core/OrchardCore.Search.Elasticsearch.Core/Deployment/ElasticIndexRebuildDeploymentSource.cs
Outdated
Show resolved
Hide resolved
src/OrchardCore/OrchardCore.Search.Elasticsearch.Core/Recipes/ElasticSettingsStep.cs
Show resolved
Hide resolved
…/Admin/Index.cshtml Co-authored-by: Zoltán Lehóczky <[email protected]>
Fix search bar and buttons
Lucene Settings for deployment steps.
Why did you merge, @sebastienros? As you can see above, I was reviewing, and after our last one I was waiting for Jasmin re-requesting my review. |
This Pull Request has become way too big to make any more reviews. There are 287 items hidden because this page is too big. Also, I think the Elasticsearch module is working well enough to merge it at least into the main branch so that people can try it and report issues. If there are any other issues we can open individual issues about them now. Also, I think it is ready enough to start using it. If you prefer @Piedone we can open a single issue about Elasticsearch where we will add a list of tasks to do. And I will open a new Pull Request if needed where we will be able to start fresh. |
I'm talking solely about reviewing changes that I requested last time. |
Yes, they were taken care of. I did not merge the "Synchronize Elasticsearch content index settings with Lucene." because somehow you should already know that you are in the Elasticsearch indices list. The other ones have been marked as resolved and are fixed. As I said, if there are issues remaining, nothing prevents us from creating new issues now. Else, maybe that's because you wanted to keep the OSOE-83 label? Also, at this point, if there was any major issue it should have been found and fixed. |
OK, but I didn't get a chance to check them before this was merged ;). I've been repeatedly reviewing this PR, including the changes you've made based on my feedback. So, before merging it, I'd have appreciated being able to do that for the last batch of changes too, to make sure that no issues slipped through (like it also happened immediately before that). In short, wait for approval before merge as usual. That is all. It's not about the label, nor about branching off new issues (what we've already done). |
True, we merged this on Tuesday's meeting with @sebastienros approval. Also because we are merging PRs that are set to be included in the 1.5 release this week so it needed to be merged.
I'm probably not understanding that last sentence. I did not see any new branch in here for fixing Elasticsearch issues yet? |
Also, I said at Tuesday's meeting that I was going to keep supporting this feature. If you guys have found any other issues and have branched off on your private repository then I'm not being advised about them. If that is the issue then please advise. I think we can say that the Elasticsearch module is functional and that for the first iteration it is good enough to be merged. We need to ship at some point else this PR becomes a never-ending iteration of things to change in Lucene and in Elasticsearch. If there are fundamental design issues with the Elasticsearch module then I should be advised about it. Never heard about anything related to that yet. So, it is now merged and I'm surprised by your reaction because I thought you would have been happy about it being merged sooner than later. |
What I meant by "branching off new issues (what we've already done)" is that you've already opened issues before to address things that came up in this PR but were out of scope of it, so we agreed not to address it in this one. I.e., not to grow this PR indefinitely. Again, my issue is only what I described above, nothing more, nothing else, and nothing hidden. Yes, naturally I also wanted this to be merged as soon as possible. But it's not the act of the merge itself but rather being ready to be merged as soon as possible. Waiting those perhaps <24 hours that it'd have taken me to check the new changes after you request review wouldn't have been against this. |
Fixes #4316
How to use:
Install Elasticsearch 7.x with Docker compose
Elasticsearch uses a mmapfs directory by default to store its indices. The default operating system limits on mmap counts is likely to be too low, which may result in out of memory exceptions.
https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html
For Docker with WSL2, you will need to persist this setting by using a .wslconfig file.
In your Windows
%userprofile%
directory (typicallyC:\Users\<username>
) create or edit the file.wslconfig
with the following:Then exit any WSL instance,
wsl --shutdown
, and restart.Elasticsearch v7.17.5 Docker Compose file :
elasticsearch.txt
docker-compose up
to deploy Elasticsearch containers.Advice: don't remove this file from its folder if you want to remove all their containers at once later on in Docker desktop.
You should get this result in Docker Desktop app :
Set up Elasticsearch in Orchard Core
Implementation details
Analyzed and Stored Properties are not very meaningful in context of Elasticsearch.
Analyzed
Analyzed is default for strings in Elasticsearch.
Because of automatic mapping, by default, all string fields are indexed twice in Elasticsearch as a "Text" Field and a "Keyword" field.
So we will have a field called ContentItemId(Text) analyzed and another called ContentItemId.Keyword to match on exact values using TermQuery for fields like ContentItemId or emails (Elastic Stores text fields in 2 fields analyzed vs not analyzed, a field ContentItemId.Keyword is created automatically)
Elasticsearch documentation:
https://www.elastic.co/blog/strings-are-dead-long-live-strings
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-index-search-time.html
Stored
Stored is really an overhead and only required if we are processing thousands of large documents.
By default Elastic will store the entire document into a field called _source and retrieves them when asked them from Index itself.
Elasticsearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/7.12/search-fields.html
Lucene vs Elasticsearch
Here is a small table to compare Lucene and Elasticsearch (string) types:
DSL Query Syntax
It is suggested to always use MatchQuery instead TermQuery for text fields in Elastic, where fully confident use (.Keyword) fields for exact match with TermQuery. (e.g. matching id, or fields like email, phone number, hostname)
Elasticsearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
NEST CheatSheet:
https://github.com/mjebrahimi/Elasticsearch-NEST-CheatSheet-Tutorials/blob/master/README.md
TODO
ContentIndexSettings
drivers to the OrchardCore.Indexing module so that they are common to every Search provider.Query
type withNest.IQuery
in the Elastic Search module. We should not parse the Queries with the OC Lucene Parser.Nest.IQuery
is on par with Elasticsearch Query DSL.Migration
Manual migration to get back Lucene Indices Settings, Deployment plans, and Queries. (Reference only)
Breaking Changes
IndexingConstants changes :
Taxonomies module indexing
You can now access the term ids of a taxonomy field by using "{ContentTypeName}.{FieldName}.Ids".
Queries migration
Elasticsearch maps automatically the data which means that Text fields will always be Analyzed. You can now access the "Stored" value of that Text field by using ".keyword" as a suffix to your field name. This means that you can now use a TermQuery on that ".keyword" field and a MatchQuery on the basic field name.
Permissions
ManageIndexes will be now ManageLuceneIndexes.
Lucene indexation
Notes
There is a new client for Elasticsearch 8.x.
Nest has been renamed to Elastic.Clients.Elasticsearch.
Also, OpenSearch will not work with this PR.
There is an initiative with moving Elastic.Clients.Elasticsearch to OpenSearch here :
https://github.com/opensearch-project/opensearch-net
This means that eventually OpenSearch and Elasticsearch will need to become 2 distinct features.
So to summarize compatibility :
Nest = OpenSearch 1.x, Elasticsearch 7.x, Elasticsearch 8.x (in compatibility mode)
https://github.com/elastic/elasticsearch-net/tree/7.17
Elastic.Client.Elasticsearch (replacing Nest) = Elasticsearch 8.x
https://github.com/elastic/elasticsearch-net
https://www.nuget.org/packages/Elastic.Clients.Elasticsearch/
OpenSearch.Client (Nuget package not released yet) = OpenSearch 2.x
opensearch-project/opensearch-build#2051